Conversation

@kareemshaik80 (Collaborator) commented Oct 30, 2025

  • restructure moe kernels folder
  • add prepare moe inputs kernels (a reference sketch of what these compute follows this description)
    • compute_problem_sizes
    • compute_expert_offsets
    • compute_expert_blockscale_offsets
    • compute_arg_sorts
    • ShuffleRows
    • ApplyShuffleMulSum


Signed-off-by: kareem <[email protected]>
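
A minimal CPU reference sketch of what these prepare-input kernels compute from the routing output. It treats topk_ids as a flattened [num_tokens * topk] array of expert ids; all names, layouts, and the host-side formulation are assumptions for illustration (the actual kernels run on XPU):

// CPU reference (not the actual SYCL kernels) for the prepare-input step.
#include <cstddef>
#include <cstdint>
#include <vector>

struct MoePrepared {
  std::vector<int32_t> problem_sizes;   // tokens routed to each expert
  std::vector<int32_t> expert_offsets;  // exclusive prefix sum of problem_sizes
  std::vector<int32_t> arg_sort;        // token slots grouped by expert
};

MoePrepared prepare_moe_input_ref(const std::vector<int32_t>& topk_ids,
                                  int num_experts) {
  MoePrepared out;

  // compute_problem_sizes: count how many token slots each expert receives.
  out.problem_sizes.assign(num_experts, 0);
  for (int32_t e : topk_ids) out.problem_sizes[e]++;

  // compute_expert_offsets: exclusive prefix sum gives each expert's start row.
  out.expert_offsets.assign(num_experts + 1, 0);
  for (int i = 0; i < num_experts; ++i)
    out.expert_offsets[i + 1] = out.expert_offsets[i] + out.problem_sizes[i];

  // compute_arg_sorts: permutation that groups token slots by expert, in order.
  // ShuffleRows would then gather activation rows with this permutation, and
  // ApplyShuffleMulSum would scatter expert outputs back, weight, and reduce.
  std::vector<int32_t> cursor(out.expert_offsets.begin(),
                              out.expert_offsets.end() - 1);
  out.arg_sort.resize(topk_ids.size());
  for (std::size_t i = 0; i < topk_ids.size(); ++i)
    out.arg_sort[cursor[topk_ids[i]]++] = static_cast<int32_t>(i);

  return out;
}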
kareemshaik80 changed the title from "Restructure MoE and add prepare inputs/meta kernel" to "Restructure MoE and add prepare inputs/meta kernel [wip]" on Oct 30, 2025
Signed-off-by: kareem <[email protected]>
kareemshaik80 changed the title from "Restructure MoE and add prepare inputs/meta kernel [wip]" to "Restructure MoE and add routing kernel [wip]" on Oct 30, 2025
kareemshaik80 and others added 5 commits November 3, 2025 08:11
kareemshaik80 changed the title from "Restructure MoE and add routing kernel [wip]" to "Restructure MoE and Add prepare input kernels" on Nov 10, 2025
kareemshaik80 changed the title from "Restructure MoE and Add prepare input kernels" to "Restructure MoE and Add MoE prepare input kernels" on Nov 10, 2025
Signed-off-by: kareem <[email protected]>
Signed-off-by: kareem <[email protected]>
@adityachatter left a comment

LGTM.

airMeng added the run-ci label on Nov 11, 2025
@msinnha1 left a comment

One general comment: also add error handling to the functions.

For example, in void prepare_moe_input():
TORCH_CHECK(topk_ids.dtype() == torch::kInt32, "topk_ids must be int32");
...
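
Expanding on that suggestion, a sketch of such checks; the argument list and shape relations here are assumed for illustration, not the merged signature:

// Hypothetical validation for prepare_moe_input; names/shapes are assumptions.
#include <torch/extension.h>

void prepare_moe_input(const torch::Tensor& topk_ids,
                       torch::Tensor& expert_offsets,
                       int64_t num_experts) {
  TORCH_CHECK(topk_ids.dtype() == torch::kInt32, "topk_ids must be int32");
  TORCH_CHECK(topk_ids.is_contiguous(), "topk_ids must be contiguous");
  TORCH_CHECK(num_experts > 0, "num_experts must be positive");
  TORCH_CHECK(expert_offsets.numel() >= num_experts + 1,
              "expert_offsets must hold at least num_experts + 1 entries");
  // ... launch the actual prepare-input kernels ...
}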

kareemshaik80 changed the title from "Restructure MoE and Add MoE prepare input kernels" to "Add MoE prepare input kernels" on Dec 10, 2025
Signed-off-by: Shaik, Kareem M <[email protected]>
Signed-off-by: Shaik, Kareem M <[email protected]>
@airMeng (Collaborator) commented Dec 10, 2025

@kareemshaik80 please rebase with the latest main

airMeng requested a review from mingfeima on December 11, 2025
@airMeng (Collaborator) left a comment

Generally follows the SGLang CUDA kernels; LGTM except for some minor comments.

@mingfeima (Collaborator) left a comment

Good job on this one!

Just a few minor places to change, and then it should be fine.

@mingfeima (Collaborator) commented

@kareemshaik80 Could you please also collect what fraction of total kernel time the activation shuffling accounts for in our benchmarks? I want to understand the overhead this creates for MoE layers.
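
One way to collect that ratio, sketched with SYCL event profiling; the submit_* entry points are hypothetical stand-ins for the repo's actual kernel launches:

// Measuring a kernel's share of MoE layer time via SYCL event profiling.
// The queue must be created with enable_profiling for this to work.
#include <sycl/sycl.hpp>

static double elapsed_ms(const sycl::event& e) {
  auto t0 = e.get_profiling_info<sycl::info::event_profiling::command_start>();
  auto t1 = e.get_profiling_info<sycl::info::event_profiling::command_end>();
  return static_cast<double>(t1 - t0) * 1e-6;  // nanoseconds -> milliseconds
}

// Usage idea (submit_shuffle / submit_rest_of_layer are hypothetical):
//   sycl::queue q{sycl::property::queue::enable_profiling{}};
//   sycl::event shuffle_ev = submit_shuffle(q, ...);
//   sycl::event rest_ev    = submit_rest_of_layer(q, ...);
//   q.wait();
//   double shuffle_ms = elapsed_ms(shuffle_ev);
//   double ratio = shuffle_ms / (shuffle_ms + elapsed_ms(rest_ev));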

Signed-off-by: Shaik, Kareem M <[email protected]>
@airMeng (Collaborator) left a comment

Signed-off-by: Shaik, Kareem M <[email protected]>
mingfeima merged commit ac9e2a7 into sgl-project:main on Dec 12, 2025
2 of 3 checks passed
@mingfeima (Collaborator) commented

@kareemshaik80 Good work here! Please continue analyzing how much overhead the shuffle takes; you can share the data internally.

airMeng added a commit that referenced this pull request Dec 12, 2025
airMeng added a commit that referenced this pull request Dec 12, 2025
kareemshaik80 added a commit to kareemshaik80/sgl-kernel-xpu that referenced this pull request Dec 12, 2025
airMeng pushed a commit that referenced this pull request Dec 12, 2025
* Revert "Revert "Add MoE prepare input kernels (#29)" (#57)"

This reverts commit eb9cfca.

Signed-off-by: Shaik, Kareem M <[email protected]>
sspintel pushed a commit to sspintel/sgl-kernel-xpu that referenced this pull request Jan 2, 2026
* Restructure MoE and add prepare inputs/meta kernel
 - restructure moe kernels folder
 - add prepare moe inputs kernel

Signed-off-by: kareem <[email protected]>

* fix minor issues

Signed-off-by: kareem <[email protected]>

* Add tests

Signed-off-by: kareem <[email protected]>

* Add shuffle_rows Kernel

Signed-off-by: kareem <[email protected]>

* register shuffle_rows

Signed-off-by: kareem <[email protected]>

* Enable Build and Add apply_shuffle_mul_sum kernel

Signed-off-by: kareem <[email protected]>

* functional

Signed-off-by: Shaik, Kareem M <[email protected]>

* cleanup

Signed-off-by: kareem <[email protected]>

* cleanup1

Signed-off-by: kareem <[email protected]>

* Modify fused expert to invoke moe_kernels and increase test coverage

Signed-off-by: Shaik, Kareem M <[email protected]>

* Cleanup makefile

Signed-off-by: Shaik, Kareem M <[email protected]>

* remove debug code

Signed-off-by: Shaik, Kareem M <[email protected]>

* fix lint

Signed-off-by: Shaik, Kareem M <[email protected]>

* Fix review comments

Signed-off-by: Shaik, Kareem M <[email protected]>

* Add to CI

Signed-off-by: Shaik, Kareem M <[email protected]>

---------

Signed-off-by: kareem <[email protected]>
Signed-off-by: Shaik, Kareem M <[email protected]>
sspintel pushed a commit to sspintel/sgl-kernel-xpu that referenced this pull request Jan 2, 2026
sspintel pushed a commit to sspintel/sgl-kernel-xpu that referenced this pull request Jan 2, 2026
* Revert "Revert "Add MoE prepare input kernels (sgl-project#29)" (sgl-project#57)"

This reverts commit eb9cfca.

Signed-off-by: Shaik, Kareem M <[email protected]>